Abstract

We investigate the problem of winner determination from computational social choice 
theory in the data stream model. Specifically, we consider the task of summarizing an 
arbitrarily ordered stream of n votes on m candidates into a small space data structure so 
as to be able to obtain the winner determined by popular voting rules. As we show, finding 
the exact winner requires storing essentially all the votes. So, we focus on the problem of 
finding an ε-winner, a candidate who could win by a change of at most an ε fraction of the
votes. We show non-trivial upper and lower bounds on the space complexity of ε-winner
determination for several voting rules, including k-approval, k-veto, scoring rules, approval,
maximin, Bucklin, Copeland, and plurality with runoff.


1 Introduction 


A common and natural way to aggregate preferences of agents is through an election. In a 
typical election, we have a set of m candidates and a set of n voters, and each voter reports 
his ranking of the candidates in the form of a vote. A voting rule selects one candidate as the 
winner once all voters provide their votes. Determining the winner of an election is one of the 
most fundamental problems in social choice theory. 

We consider elections held in an online setting where voters vote in arbitrary order, and we 
would like to find the winner at any point in time. A very natural scenario where this occurs is an 
election conducted over the Internet. For instance, websites often ask for rankings of restaurants 
in a city and would like to keep track of the “best” restaurant according to some fixed voting 
rule. Traditionally, social choice theory addresses settings where the number of candidates is 
much smaller than the number of voters. However, we now often have situations where both 
the candidate set and voter set are very large. For example, the votes may be the result of 
high-frequency measurements made by sensors in a network [26], and a voting rule could be 
used to aggregate the measurements (as argued in [I?]). Also, in online participatory democracy 
systems, such as I widl. svnfl . the number of candidates can be as large as the number of voters. 
The naive way to conduct an online election is to store all the vote counts in a database and 
recompute the winner whenever it is needed. The space complexity of this approach becomes 
infeasible if the number of candidates or the number of votes is too large. Can we do better? Is 
it possible to compress the votes into a short summary that still allows for efficient recovery of 
the winner? 


This question can be naturally formulated in the data stream model [3, 20]. Votes are
interpreted as items in a data stream, and the goal is to devise an algorithm with minimum space
requirement to determine the election winner. In the simplest setting of the plurality voting
rule, where each vote is simply an approval for a single candidate and the winner is the
candidate approved by the most voters, our problem is closely related to the classic problem of
finding heavy hitters in a stream. For other popular voting rules, such as Borda, Bucklin or
Condorcet consistent voting rules, the questions become somewhat different.

Regardless of the voting rule, if the goal is to recover only the winner and the stream of votes is 
arbitrary, then it becomes essentially impossible to do anything better than the above-mentioned 
naive solution (even when the algorithm is allowed to be randomized). Although we prove this 
formally, the reason should be intuitively clear: the winner may be winning by a very tiny 
margin thereby making every vote significant to the final outcome. We therefore consider a 
natural relaxation of the winner determination problem, where the algorithm is allowed to 
output any candidate who could have been the winner, according to the voting rule under 
consideration, by a change of at most εn votes. We call such a candidate an ε-winner; similar
notions were introduced in [25, 36]. Note that if the winner wins by a margin of victory [36] of
more than εn, there is a unique ε-winner.

In this work, we study streaming algorithms to solve the (ε, δ)-winner determination problem,
i.e., the task of determining, with probability at least 1 − δ, an ε-winner of any given vote
stream according to popular voting rules. Our algorithms are necessarily randomized.


1.1 Our Contributions 

We initiate the study of streaming algorithms for the (ε, δ)-Winner Determination problem
with respect to various voting rules. The results for the (ε, δ)-Winner Determination problem,
when both ε and δ are positive, are summarized in Table 1. (When ε or δ equals 0, we
prove that the space requirements are much larger.)

We also exhibit algorithms, with space complexity nearly the same as in Table 1, for the more
general sliding window model introduced by Datar et al. [13]. In this setting, for some
parameter N, we want to find an ε-winner with respect to the N most recent votes in the
stream, a well-motivated scenario in online elections.


1.2 Related Work 
1.2.1 Social Choice 


To the best of our knowledge, our work is the first to systematically study the approximate 
winner determination problem in the data stream model. A conceptually related work is that of 
Conitzer and Sandholm [10] who study the communication complexity of common voting rules. 
They consider n parties each of whom knows only their own vote but, through a communication 
protocol, would like to compute the winner according to a specific voting rule. Observe that a 
streaming algorithm for exact winner determination using s bits of memory space immediately 
implies a one-way communication protocol where each party transmits s bits. However, it turns 
out that their results only imply weak lower bounds for the space complexity of streaming 
algorithms. Moreover, [10] does not study determination of ε-winners. The communication
complexity of voting rules was also highlighted by Caragiannis and Procaccia in [7].

In a recent work, we [15] studied the problem of determining election winners from a random
sample of the vote distribution. Since we can randomly sample from a stream of votes using a 
small amount of extra storage, the bounds from [15] are also useful in the streaming context. 


'Each party can input its vote into the stream and then communicate the memory contents of the streaming 
algorithm to the next party. 
Voting Rule

Space complexity 

Upper bound 

Lower bound 

Generalized plurality 1 ' 

0 (min{ilogm,mlogi} + log log?t) 

[Theorem [3] and 0 

fi( - log - + - 7 = log nt + log log n) if in > - 

[Theorem® 

fc-veto* 

0 (min{| log m, to log hslutAiil 1 . + loglogn) 

[Theorem 0] 

fi ( 5 r lo g J + l°g m + loglogn) if nt > j 
for every /.t e [0,1) ITheoremll 6 H 

Plurality 

0 (min{ilogm,mlogi} + log logit) 

[Theorem [3] and 0 

jfl(i log i + log m + loglogn) ifnt>j, 

I fi(mlog 7 + loglogn) ifnt<i, 

[Theorem [15] and 11911 
[Theorem l20ll 
[Observation |2]] 

fc-approval* 

0 (min {7 log m,m log log ^ fc ^} + loglogn) 

[Theorem |5]l 

Scoring rules 

0 (m(loglogin + log i) + loglogn) 

[Theorem [5]] 

Approval 

0 (m(loglogin + log j) + loglogn) 

[Theorem 01 

Maximin, Bucklin, 
Run off 

0 (min{nt 2 (log log m + log i), 
p-nt log 2 m} + log log n ) 

[Theorem [7] and 0 

Copeland 

0 (min{nt 2 (log log m + log i), 
jj-nt log 4 m} + log log n) 

[Theorem [7] and 0 


Table 1: Space complexity for the (ε, δ)-Winner Determination problem for various voting rules. We
do not show dependence on δ in the table for sake of clarity. *: The lower bound results for the
k-approval and k-veto voting rules apply only for k = O(m^γ), for every γ ∈ [0, 1). †: For the case
m < 1/ε, the lower bound is the same as that of other rules. ‡: Here, each voter has the choice to
either approve or disapprove of one candidate, and the candidate who has the maximum number of
approvals minus disapprovals wins.


In that work, the goal was to find the winner, who was assumed to have a margin of victory
of at least εn, but the same arguments also work for finding ε-winners.


1.2.2 Streaming 

The field of streaming algorithms has been the subject of intense research over the past two
decades in both the algorithms and database communities. The theoretical foundations for the
area were laid by [3, 20]. A stream is a sequence of data items σ_1, σ_2, …, σ_n drawn from the
universe [m], such that on each pass through the stream, the items are read once in that order.
The frequency vector f = (f_1, …, f_m) associated with the stream is defined by letting f_j be the
number of times j occurs as an item in the stream. In this definition, the stream is
insertion-only; more generally, in the turnstile model, items can both be inserted and deleted
from the stream, in which case the frequency vector maintains the cumulative count of each
element in [m]. General surveys of the area can be found in [32, 33].

Algorithms for the insertion-only case were discovered before the formulation of the data
streaming model. Consider the point-query problem: for a stream of n items from a universe of
size m and a parameter ε > 0, the goal is to output, for any item j ∈ [m], an estimate f̂_j such
that |f̂_j − f_j| ≤ εn. Misra and Gries [30] gave an elegant but simple deterministic algorithm
requiring only O(min{m, 1/ε} · (log m + log n)) bits of space. Since to find an ε-winner


n The algorithm can be viewed as a generalization of the Boyer-Moore [SllZt] algorithm for e = 1/2. It was also 
rediscovered 20 years later by fum. 
for the plurality voting rule, it is enough to solve the point query problem and output the j with
maximum f̂_j, Misra-Gries automatically implies O(min{m, 1/ε} · (log m + log n)) space
complexity for plurality. We use sampling to improve the dependence on n and prove tightness in
terms of ε and n. Our algorithms for many of the other voting rules are also based on the
Misra-Gries algorithm. We note that in place of Misra-Gries, there are several other deterministic
algorithms which could have been used, such as Lossy Counting [27] and Space Saving [28], but they
would not change the asymptotic space complexity bounds. A thorough overview of the point query,
or frequency estimation, problem can be found in [11].
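
To make the connection between point queries and plurality concrete, here is a minimal Python sketch of the Misra-Gries counter scheme followed by an arg-max over the estimates. The function names and the choice of ⌈1/ε⌉ counters are illustrative assumptions, not the paper's exact parameterization; the sketch omits the sampling step used later to remove the dependence on n.

```python
import math
from typing import Dict, Hashable, Iterable, Optional


def misra_gries(stream: Iterable[Hashable], num_counters: int) -> Dict[Hashable, int]:
    """Keep at most `num_counters` counters over an insertion-only stream.

    Each returned count undercounts the true frequency by at most
    n / (num_counters + 1), where n is the stream length.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def approx_plurality_winner(votes: Iterable[Hashable], eps: float) -> Optional[Hashable]:
    """Return the candidate with the largest Misra-Gries estimate.

    Assumption: ceil(1/eps) counters, so every estimate is within eps * n
    of the true plurality score.
    """
    counters = misra_gries(votes, math.ceil(1 / eps))
    return max(counters, key=counters.get) if counters else None
```

The returned candidate trails the true plurality leader by at most εn votes, so moving that many votes to it makes it a (co-)winner, which is the sense in which a point-query guarantee yields an ε-winner.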

For the more general turnstile model, the point query problem for such streams is that of
finding f̂_j, for every j, such that |f̂_j − f_j| ≤ ε‖f‖_1. The best result for this problem is
due to Cormode and Muthukrishnan, the randomized count-min sketch [12], which has space complexity
O((1/ε) log m log n) in bits. The space bound was proved to be essentially tight by Jowhari et al. in
[21]. In our context, the stream is a sequence of votes; so, our problems are mostly, just by
definition, insertion-only. However, the count-min sketch becomes useful in our applications
(i) if voters can issue retractions of their votes, and (ii) to maintain counts of random samples
drawn from streams of unknown length.
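
The retraction use case can be illustrated with a small count-min sketch in Python. This is only a sketch under stated assumptions: items are non-negative integers, the hash family is the usual ((a·x + b) mod p) mod w construction with a fixed Mersenne prime, and the width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉ are the standard textbook choices rather than anything tuned for the algorithms in this paper.

```python
import math
import random
from typing import List, Tuple


class CountMinSketch:
    """Count-min sketch over integer items, supporting +1 / -1 updates."""

    _PRIME = (1 << 61) - 1  # Mersenne prime used by the hash family (assumption)

    def __init__(self, eps: float, delta: float, seed: int = 0) -> None:
        self.width = math.ceil(math.e / eps)
        self.depth = max(1, math.ceil(math.log(1 / delta)))
        rng = random.Random(seed)
        self._hashes: List[Tuple[int, int]] = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(self.depth)
        ]
        self._table = [[0] * self.width for _ in range(self.depth)]

    def _columns(self, item: int) -> List[int]:
        return [((a * item + b) % self._PRIME) % self.width for a, b in self._hashes]

    def update(self, item: int, change: int = 1) -> None:
        """Apply a vote (+1) or a retraction (-1) for `item`."""
        for row, col in enumerate(self._columns(item)):
            self._table[row][col] += change

    def estimate(self, item: int) -> int:
        """Never underestimates as long as all true counts are non-negative."""
        return min(self._table[row][col] for row, col in enumerate(self._columns(item)))
```

For example, `sketch.update(2)` followed by `sketch.update(2, -1)` records a vote for candidate 2 and then its retraction, which an insertion-only summary such as Misra-Gries cannot express.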


1.3 Technical Overview 

Upper Bounds. The streaming algorithms that achieve the upper bounds shown in Table 1
are obtained by applying frequency estimation algorithms, such as Misra-Gries or count-min
sketch, appropriately on a subsampled stream. The number of samples needed to obtain
ε-winners for the various voting rules was previously analyzed in [15].


Lower Bounds. Our main technical novelty is in the proofs of the lower bounds for the (ε, δ)-
winner determination problem. Usually, in the “heavy hitters” problem in the algorithms
literature, the task is roughly to determine the set of items with frequency above εn. Since there
can be 1/ε such items, a space lower bound of Ω((1/ε) log(εm)) immediately follows for
m ≫ 1/ε. In contrast, we wish to determine only one ε-winner, so that just log m bits are needed
to output the result. In order to obtain stronger lower bounds that depend on ε, we need to
resort to other techniques. Moreover, note that our lower bounds are in the insertion-only stream
model, whereas previous lower bounds for frequency estimation problems are usually for the
more general turnstile model.

We prove these bounds through new reductions from fundamental problems in communication 
complexity. To give a flavor of the reductions, let us sketch the proof for the plurality voting 
rule. Consider each additive term separately in the lower bound. 


• log log n: Suppose Alice has a number 1 ≤ a ≤ n and Bob a number 1 ≤ b ≤ n, and Bob wishes
to know whether a > b through a protocol where communication is one way from Alice to
Bob. It is known [29, 34] that Alice is required to send Ω(log n) bits to Bob. We can reduce
this problem to finding a 1/3-winner in a plurality election among two candidates by having
Alice push 2^a approvals for candidate 1 into the stream and Bob push 2^b approvals for
candidate 2; since the resulting stream has length exponential in n, the Ω(log log n) lower
bound in terms of the stream length follows.


• (1/ε) log(1/ε) when m > 1/ε: Consider the Indexing problem over an arbitrary alphabet:
Alice has a vector x ∈ [t]^m and Bob an index i ∈ [m], and Bob wants to find x_i through a
one-way protocol from Alice to Bob. Ergün et al. [16], extending the proof known for the case of
t = 2, show that Alice needs to send Ω(m log t) bits. For t = m = 1/√ε, we reduce Indexing to
ε-winner determination for a plurality election. Let the candidate set be [t] × [m]. Alice (given
her input x) pushes n/2 votes into the stream, with √ε·n/2 votes to each (x_j, j) for all j ∈ [m],
and sends over the memory content of the streaming algorithm to Bob, who (given his input i)
pushes another n/2 votes into the stream, with √ε·n/2 votes to each (a, i) for all a ∈ [t]. Note
that candidate (x_i, i) is the unique (√ε/4)-winner of this plurality election! Using [16]'s lower
bound of Ω((1/√ε) log(1/ε)) on the communication complexity of the Indexing problem yields
our result.

• m log(1/ε) when m < 1/ε: Suppose Alice has a vector a ∈ [t]^m and Bob a vector b ∈ [t]^m, and
Bob wants to find i = argmax_j (a_j + b_j) through a one-way protocol. We show by reducing
from the Augmented Indexing problem [16, 29] that Alice needs to send Ω(m log t) bits
to Bob. Suppose t = 1/ε. Alice imagines her vector a as being the vote count for a plurality
election among m candidates, streams in a and runs the streaming algorithm for the problem,
and passes the memory content to Bob, who also streams in his vector b. The maximum
entry in a + b corresponds to a candidate winning by a margin of at least ε²n, hence yielding the
Ω(m log(1/ε)) lower bound.
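
The Indexing reduction sketched above can be checked on small inputs. The following Python snippet is an illustration only: it uses exact vote tallies as a stand-in for the streaming algorithm's memory, 0-based indices, and a hypothetical per-candidate vote count `c` in place of √ε·n/2.

```python
from collections import Counter
from typing import List, Tuple

Candidate = Tuple[int, int]  # (symbol, position) pairs from [t] x [m]


def alice_votes(x: List[int], c: int) -> "Counter[Candidate]":
    """Alice gives c votes to each candidate (x_j, j)."""
    return Counter({(x_j, j): c for j, x_j in enumerate(x)})


def bob_votes(t: int, i: int, c: int) -> "Counter[Candidate]":
    """Bob gives c votes to each candidate (a, i), for every symbol a."""
    return Counter({(a, i): c for a in range(t)})


def recover_entry(x: List[int], t: int, i: int, c: int = 5) -> int:
    """Decode x_i from the plurality winner of the combined election."""
    tally = alice_votes(x, c) + bob_votes(t, i, c)
    (symbol, _position), _score = tally.most_common(1)[0]
    return symbol
```

Only (x_i, i) receives votes from both players, so it ends with 2c votes while every other candidate has at most c; any algorithm that names this winner therefore reveals x_i to Bob.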


2 Preliminaries 

2.1 Voting and Voting Rules 

Let V = {v_1, …, v_n} be the set of all voters and C = {c_1, …, c_m} the set of all candidates. If not
mentioned otherwise, V, C, n and m denote the set of voters, the set of candidates, the number of
voters and the number of candidates respectively. Each voter v_i's vote is a complete order ≻_i
over the candidate set C. For example, for two candidates a and b, a ≻_i b means that the voter v_i
prefers a to b. We denote the set of all complete orders over C by L(C). Hence, L(C)^n denotes
the set of all n-voters' preference profiles (≻_1, …, ≻_n). A map r : ∪_{n, |C| ∈ ℕ⁺} L(C)^n → 2^C is
called a voting rule. Given a vote profile ≻ ∈ L(C)^n, we call the candidates in r(≻) the winners.
Given an election E = (V, C), we can construct a weighted graph G_E, called the weighted majority
graph, from E. The set of vertices in G_E is the set of candidates in E. For any two candidates x and
y, the weight on the edge (x, y) is D_E(x, y) = N_E(x, y) − N_E(y, x), where N_E(x, y) (respectively
N_E(y, x)) is the number of voters who prefer x to y (respectively y to x). A candidate x is called
the Condorcet winner in an election E if D_E(x, y) > 0 for every other candidate y ≠ x. A voting
rule is called Condorcet consistent if it selects the Condorcet winner as the winner of the election
whenever it exists. Some examples of common voting rules are:

• Positional scoring rules: A collection of m-dimensional vectors α = (α_1, α_2, …, α_m) ∈ ℝ^m
with α_1 ≥ α_2 ≥ ⋯ ≥ α_m and α_1 > α_m, for every m ∈ ℕ, naturally defines a voting rule: a
candidate gets score α_i from a vote if it is placed at the i-th position. The score of a candidate
is the sum of the scores it receives from all the votes. The winners are the candidates with
maximum score. The vector α that is 1 in the first k coordinates and 0 in other coordinates
gives the k-approval voting rule. The vector α that is 1 in the last k coordinates and 0 in
other coordinates is called the k-veto voting rule. Observe that the score of a candidate in the
k-approval (respectively k-veto) voting rule is the number of approvals (respectively
vetoes) that the candidate receives. 1-approval is called the plurality voting rule, and 1-veto
is called the veto voting rule. The score vector (m − 1, m − 2, …, 1, 0) gives the Borda rule.

• Generalized plurality: In generalized plurality voting, each voter approves or disapproves
one candidate. The score of a candidate is the number of approvals it receives minus the number
of disapprovals it receives. The candidates with the highest score are the winners. We introduce
this rule and consider it to be interesting particularly in an online setting where every voter

111 Assume the maximum is unique. 
either likes or dislikes an item; hence each vote is either an approval for a candidate or a
disapproval for a candidate.

• Approval: In approval voting, each voter approves a subset of candidates. The winners are 
the candidates which are approved by the maximum number of voters. 

• Maximin: The maximin score of a candidate x is min_{y ≠ x} D_E(x, y). The winners are the
candidates with maximum maximin score.

• Copeland: The Copeland score of a candidate x is |{y ≠ x : D_E(x, y) > 0}|. The winners are
the candidates with maximum Copeland score.

• Bucklin: A candidate x's Bucklin score is the minimum number ℓ such that more than half of
the voters rank x in their first ℓ positions. The winners are the candidates with lowest Bucklin
score.

• Plurality with runoff: The top two candidates according to plurality score are selected first. 
The pairwise winner of these two candidates is selected as the winner of the election. This 
rule is often called the runoff voting rule. 

Among the above, only the maximin and Copeland rules are Condorcet consistent. 
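
As a concrete reference for the definitions above, the following Python sketch computes exact winners for several of the listed rules from a fully stored profile, i.e. the naive baseline that the streaming algorithms aim to avoid. Representing each vote as a tuple of candidate names, most preferred first, and all function names are assumptions made for illustration.

```python
from typing import Dict, Sequence, Set, Tuple

Vote = Sequence[str]  # a ranking, most preferred candidate first


def pairwise_margins(profile: Sequence[Vote], cands: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """margin[(x, y)] = (#voters preferring x to y) - (#voters preferring y to x)."""
    margin = {(x, y): 0 for x in cands for y in cands if x != y}
    for vote in profile:
        pos = {c: i for i, c in enumerate(vote)}
        for (x, y) in margin:
            margin[(x, y)] += 1 if pos[x] < pos[y] else -1
    return margin


def _best(score: Dict[str, int], lowest: bool = False) -> Set[str]:
    target = min(score.values()) if lowest else max(score.values())
    return {c for c, s in score.items() if s == target}


def scoring_winners(profile: Sequence[Vote], alpha: Sequence[int]) -> Set[str]:
    """Positional scoring rule with score vector alpha (e.g. Borda, k-approval)."""
    score = {c: 0 for c in profile[0]}
    for vote in profile:
        for i, c in enumerate(vote):
            score[c] += alpha[i]
    return _best(score)


def maximin_winners(profile: Sequence[Vote]) -> Set[str]:
    cands = list(profile[0])
    d = pairwise_margins(profile, cands)
    return _best({x: min(d[(x, y)] for y in cands if y != x) for x in cands})


def copeland_winners(profile: Sequence[Vote]) -> Set[str]:
    cands = list(profile[0])
    d = pairwise_margins(profile, cands)
    return _best({x: sum(1 for y in cands if y != x and d[(x, y)] > 0) for x in cands})


def bucklin_winners(profile: Sequence[Vote]) -> Set[str]:
    n, cands = len(profile), list(profile[0])
    score = {}
    for x in cands:
        for l in range(1, len(cands) + 1):
            if 2 * sum(1 for v in profile if x in v[:l]) > n:
                score[x] = l  # smallest l with a strict majority ranking x in the top l
                break
    return _best(score, lowest=True)
```

On the profile with four votes a ≻ b ≻ c, three votes b ≻ c ≻ a and two votes c ≻ b ≻ a, plurality selects a, whereas Borda, maximin and Copeland all select the Condorcet winner b.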

2.2 Model of Input Data 

In the basic model, the input data is an insertion-only stream of elements from some universe U.
We note that, in the context of voting in an online scenario, the natural model of input data is
the insertion-only streaming model over the universe of all possible votes L(C). The basic model
can be generalized to the more sophisticated sliding window model, where the only active items
are the last n items, for some parameter n. In this work, we focus on winner determination
algorithms for insertion-only streams of votes in both the basic and sliding window models. The
basic input model can also be generalized to another input model, called the turnstile model, where
the input data is a sequence from U × {1, −1}; every element in the stream corresponds to
either a unit increment or a unit decrement of the frequency of some element from U. We will use
the turnstile streaming model (over some different universe) only to design efficient winner
determination algorithms for the insertion-only stream of votes. We note that the algorithms
for streaming data can make only one pass over the input data. These one-pass algorithms
are also called streaming algorithms.


2.3 Communication Complexity 


We will use lower bounds on communication complexity of certain functions to prove space 
complexity lower bounds for our problems. Communication complexity of a function measures 
the number of bits that need to be exchanged between two players to compute a function whose 
input is split among those two players [37]. In a more restrictive one-way communication model, 
the first player sends only one message to the second player and the second player outputs the 
result. A protocol is a method that the players follow to compute certain functions of their input. 
Also the protocols can be randomized; in that case, the protocol needs to output correctly with
probability at least 1 − δ, for some parameter δ ∈ [0, 1] (the probability is taken over the random
coin tosses of the protocol). The randomized one-way communication complexity of a function
f with error δ is denoted by R_δ^{1-way}(f). Classically the first player is named Alice and the
second player is named Bob and we also follow the same convention here. 0 is a standard 
reference for communication complexity. 
2.4 Chernoff Bound


We will use the following concentration inequality: 

Theorem 1. Let X_1, …, X_ℓ be a sequence of ℓ independent random variables in [0, 1] (not
necessarily identical). Let S = Σ_i X_i and let μ = E[S]. Then, for any 0 < δ < 1:

Pr[|S − μ| ≥ δℓ] ≤ 2 exp(−2ℓδ²)


and

Pr[|S − μ| ≥ δμ] ≤ 2 exp(−δ²μ/3)

The first inequality is called an additive bound and the second multiplicative. 
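
As a worked illustration of how the additive bound is typically used, the following Python helper solves 2 exp(−2ℓδ²) ≤ η for ℓ, giving ℓ ≥ ln(2/η) / (2δ²). The function name and the symbol η for the target failure probability are assumptions introduced here; the sample sizes used for specific voting rules in the paper come from [15] and are not derived by this helper.

```python
import math


def samples_for_additive_error(delta: float, fail_prob: float) -> int:
    """Smallest l with 2 * exp(-2 * l * delta**2) <= fail_prob.

    `delta` is the additive deviation allowed per sample and `fail_prob`
    is the target failure probability (both hypothetical parameter names).
    """
    if not (0 < delta < 1 and 0 < fail_prob < 1):
        raise ValueError("delta and fail_prob must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / fail_prob) / (2 * delta ** 2))
```

For instance, a deviation of δ = 0.1 with failure probability 0.05 needs 185 samples, independent of the length of the stream being sampled.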

2.5 Problem Definition 

The basic winner determination problem is defined as follows. 

Definition 1. (Winner Determination)

Given a voting profile ≻ over a set of candidates C and a voting rule r, determine the winners r(≻).

We show a strong space complexity lower bound for the Winner Determination problem for
the plurality voting rule in Theorem 12. To overcome this theoretical bottleneck, we focus on
determining an approximate winner of an election. Below we define the notion of an ε-approximate
winner, which we also call an ε-winner.

Definition 2. (ε-winner)

Given an n-voter voting profile ≻ over a set of candidates C and a voting rule r, a candidate w is
called an ε-winner if w can be made winner by changing at most εn votes in ≻.
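
For the plurality rule, the definition can be checked directly from the vote counts. The Python sketch below computes the minimum number of vote changes that make a candidate a winner, under the assumption that tying for the top score already counts as winning (the definition above leaves tie-breaking unspecified); the function names are illustrative.

```python
from typing import Dict


def min_changes_to_win(scores: Dict[str, int], w: str) -> int:
    """Fewest votes to change so that w has a maximum plurality score.

    Each useful change moves one vote from some other candidate to w.
    After t changes w has scores[w] + t, and the votes removed from the
    others must cover their total excess over that level.
    """
    n = sum(scores.values())
    for t in range(n + 1):
        target = scores[w] + t
        excess = sum(max(0, s - target) for c, s in scores.items() if c != w)
        if excess <= t:
            return t
    raise AssertionError("unreachable: t = n always suffices")


def is_eps_winner(scores: Dict[str, int], w: str, eps: float) -> bool:
    """True if w can be made a plurality (co-)winner by changing at most eps * n votes."""
    return min_changes_to_win(scores, w) <= eps * sum(scores.values())
```

With scores a: 5, b: 3, c: 2, candidate b needs one changed vote and c needs two, so b is a 0.15-winner while c is not.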

Notice that there always exists an ε-winner in every election since a winner is also an ε-winner.
We show that finding even an ε-winner deterministically requires large space when the number
of votes is large (see Theorem 14). However, we design space-efficient randomized algorithms
which output an ε-winner of an election with probability at least 1 − δ. The problem that we
study here is called the (ε, δ)-Winner Determination problem and is defined as follows.

Definition 3. ((ε, δ)-Winner Determination)

Given a voting profile ≻ over a set of candidates C and a voting rule r, determine an ε-winner with
probability at least 1 − δ. (The probability is taken over the internal coin tosses of the algorithm.)


3 Upper Bounds 

In this section, we present the algorithms for the (ε, δ)-Winner Determination problem for
various voting rules. Before embarking on specific algorithms, we first prove a few supporting
results that will be used crucially in our algorithms later. We begin with the following
space-efficient algorithm for picking an item uniformly at random from a universe of size n.

Observation 1. There is an algorithm for choosing an item with probability 1/n that uses
O(log log n) bits of memory and uses a fair coin as its only source of randomness.
Proof. First let us assume, for simplicity, that n is a power of 2. We toss a fair coin log_2 n
many times and choose the item, say x, only if the coin comes up heads every time. Hence the
probability that the item x gets chosen is 1/n. We need O(log log n) space to toss the fair coin
log_2 n times (to keep track of the number of times we have tossed the coin so far). If n is not
a power of 2, then we toss the fair coin ⌈log_2 n⌉ many times and choose the item x only if the
coin comes up heads in all the tosses, conditioned on some event E. The event E contains exactly n
outcomes including the all heads outcome. □
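
The power-of-two case of this proof can be sketched in a few lines of Python. The only state kept between tosses is the toss counter, which never exceeds log_2 n and therefore fits in O(log log n) bits; the `toss` callback standing in for the fair coin is an assumption made so that the behaviour can be checked deterministically.

```python
from typing import Callable


def choose_with_probability_one_over_n(n: int, toss: Callable[[], bool]) -> bool:
    """Return True with probability 1/n, for n a power of two.

    `toss` is a fair coin returning True for heads. The item is chosen
    only if all log2(n) tosses come up heads.
    """
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("this sketch only handles n that is a power of two")
    tosses_needed = n.bit_length() - 1  # equals log2(n)
    tosses_done = 0                     # the only memory carried across tosses
    while tosses_done < tosses_needed:
        if not toss():
            return False
        tosses_done += 1
    return True
```

In practice `toss` could be `lambda: random.random() < 0.5`; enumerating all outcomes for n = 8 shows that exactly one of the eight equally likely toss sequences selects the item.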

We remark that Observation 1 is tight in terms of space complexity. We state the claim formally
below, as it may be interesting in its own right.

Proposition 1. Any algorithm that chooses an item from a set of size n with probability p, for
0 < p ≤ 1/n, using a fair coin as its only source of randomness, must use Ω(log log n) bits of memory.

Proof. The algorithm tosses the fair coin some number of times (the number of times it tosses
the coin may also depend on the outcome of the previous tosses) and finally picks an item from
the set. Consider a run R of the algorithm where it chooses the item, say x, with the smallest
number of coin tosses; say it tosses the coin t many times in this run R. This means that in any
other run of the algorithm where the item x is chosen, the algorithm must toss the coin at least
t times. Let the outcomes of the coin tosses in R be r_1, …, r_t. Let s_i be the memory
content of the algorithm immediately after it tosses the coin for the i-th time, for i ∈ [t], in the
run R. First notice that if t < log_2 n, then the probability with which the item x is chosen is more
than 1/n, which would be a contradiction. Hence, t ≥ log_2 n. Now we claim that all the s_i's must be
different. Indeed, otherwise, let us assume s_i = s_j for some i < j. Then the algorithm chooses
the item x after tossing the coin t − (j − i) (which is strictly less than t) many times when the
outcomes of the coin tosses are r_1, …, r_i, r_{j+1}, …, r_t. This contradicts the assumption that the
run R we started with chooses the item x with the smallest number of coin tosses. Since the
t ≥ log_2 n memory contents are all distinct, the algorithm needs at least log_2 t = Ω(log log n)
bits of memory. □

An essential ingredient in our algorithms is calculating the approximate frequencies of all the
elements in a universe in an input data stream. The following result (due to [30]) provides a
space-efficient algorithm for that job.

Theorem 2. Given an insertion-only stream of length n over a universe of size m, there is a
deterministic one pass algorithm to find the frequencies of all the items in the stream within an
additive approximation of εn using O(min{(1/ε)(log m + log n), m log n}) bits of memory, for every
ε > 0.

Proof. The O((1/ε)(log m + log n)) space algorithm is due to [30]. On the other hand, notice that
with space O(m log n), we can exactly count the frequency of every element, even in the turnstile
model of stream, by simply keeping an array of length m (indexed by ids of the elements from
the universe) each entry of which is capable of storing integers up to n. □

We now describe streaming algorithms for the (ε, δ)-Winner Determination problem for
various voting rules. The general idea is to sample a certain number of votes uniformly at random
from the stream of votes using the algorithm of Observation 1 and generate another stream of
elements over some different universe. The number of votes sampled and the universe of the
stream generated depend on the specific voting rule we are considering. After that, we
approximately calculate the frequencies of the elements in the generated stream using Theorem 2.
For simplicity, we assume that the number of votes is known in advance up to some constant
factor (only to be able to apply Observation 1). We will see in Section 3.1 how to get rid of this
assumption, without affecting the space complexity of any of the algorithms much. We begin with
the k-approval and k-veto voting rules below.
Theorem 3. Assume that the number of votes is known to be within [c_1 n, c_2 n] for
some constants c_1 and c_2 in advance. Then there is a one pass algorithm for
the (ε, δ)-Winner Determination problem for the k-approval voting rule that uses
O (min{| (logm + log ^ + log log , m (log los ^ +1 ^ + log log ij} + log log bits of memory and 

for the k-veto voting rule that uses ()( min{^ (logm + log ^ + log log |) , m( log lQ g( m ~ fc+1 ) + 
log log |)} + log log n) bits of memory. 

Proof. Let us first consider the case of the k-approval voting rule. We pick the current vote in
the stream with probability p (the value of p will be decided later) independent of other votes.
Suppose we sample ℓ many votes; let S = {v_i : i ∈ [ℓ]} be the set of votes sampled. From the set
of sampled votes S, we generate a stream T over the universe C as follows. For i ∈ [ℓ], let the
vote v_i be c_1 ≻ c_2 ≻ ⋯ ≻ c_m. From the vote v_i, we add the k candidates c_1, …, c_k to the stream T. We
know that there is an ℓ = O((log(k+1)/ε²) log(1/δ)) (and thus a corresponding p = Ω(ℓ/n)) which ensures

that for every candidate x e C, |^il _ < | w ith probability at least 1 - | [15], where s(-) 

and s(-) are the scores of the candidates in the input stream of votes and in S respectively. Now 
we count s(x ) for every candidate x e C within an additive approximation of y and the result 
follows from Theorem [2] (notice that the length of the stream T is k£). 

For the k-veto voting rule, we approximately calculate the number of vetoes that every candidate
gets using the same technique as above. However, for the k-veto voting rule, the corresponding
bound for ℓ is O((log(m−k+1)/ε²) log(1/δ)), which implies the result. □
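
The sample-then-count pattern of this proof can be sketched as follows in Python. This is an illustration under simplifying assumptions: each vote is kept independently with a caller-supplied probability `p`, the derived stream T is counted exactly with a dictionary (the "one counter per candidate" branch of Theorem 2) rather than with Misra-Gries, and no attempt is made to choose `p` according to the bound from [15].

```python
import random
from collections import Counter
from typing import Iterable, Optional, Sequence


def sampled_k_approval_winner(
    votes: Iterable[Sequence[str]], k: int, p: float, seed: Optional[int] = None
) -> Optional[str]:
    """One pass over `votes`; returns the k-approval leader among sampled votes.

    Each vote is a ranking, most preferred first. A sampled vote contributes
    its top k candidates to the derived stream T, whose frequencies are the
    k-approval scores within the sample.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for vote in votes:
        if rng.random() < p:          # keep this vote with probability p
            counts.update(vote[:k])   # append c_1, ..., c_k to the stream T
    if not counts:
        return None                   # no vote was sampled
    return max(counts, key=counts.get)
```

Setting `p = 1.0` degenerates to exact k-approval counting, which is a convenient sanity check; smaller values of `p` trade accuracy for a shorter derived stream.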

By similar techniques, we have the following algorithm for the generalized plurality rule. 

Theorem 4. Assume that the number of votes is known to be within [c_1 n, c_2 n] for some constants
c_1 and c_2 in advance. Then there is a one pass algorithm for the (ε, δ)-Winner Determination
problem for the generalized plurality voting rule that uses
O((1/ε)(log m + log(1/ε) + log log(1/δ)) + log log n) bits of memory.

Proof. We sample ℓ = O((1/ε²) log(1/δ)) many votes uniformly at random from the input stream of
votes using the technique used in the proof of Theorem 3. For every candidate, we count both
the number of approvals and disapprovals that it gets within an additive approximation of 
which is enough to get an ε-winner. Now the space complexity follows from Theorem 2. □


We generalize Theorem 3 to the class of scoring rules next. We need the following result, which
is due to [15], in the subsequent proof.


Lemma 1. Let α = (α_1, …, α_m) be an arbitrary score vector and w the winner of an α-election E.
Let x be any candidate which is not an ε-winner. Then, s(w) − s(x) ≥ α_1 εn.


With Lemma 1 at hand, we now present the algorithm for the scoring rules.

Theorem 5. Assume that the number of votes is known to be within [cin,C 2 n] for any con¬ 
stants ci and C 2 in advance. Let a = (ai be a score vector such that ai > 0 for ev¬ 
ery i e \m\. Then there is a one pass algorithm for the (e,<5)-WiNNER Determination prob¬ 
lem for the a-scoring rule that uses O 1 (log log m + log 7 + log log |) + log log nj, which is 

0 (in (log log m + log 7 + log log |) + log log ?r), bits of memoiy. 

Proof. Let a = (aa, ■■■, a m ) be an arbitrary score vector with a; > 0 for every i e \m\. We define 
a'i = ym* (which is in [0,1]), for every i e \m\. Since scoring rules remain same even if we 

multiply every with any positive constant A, the score vectors a and a' correspond to same 
voting rule. We pick the current vote in the stream with probability p (the value of p will be
decided later) independent of other votes. Suppose we sample £ many votes; let S = {vi : i e [f]} 
be the set of votes sampled. For i e \£\, let the vote ty be c\ > C 2 > ••• > c m . We pick the candidate 
Ci from the vote v, with probability a' and define it to be a,. We compute the frequencies of the 
candidates in the stream S = {a; : i e [(]} within an additive factor of e'n, where e' = |. For 
every candidate x e C, let s(x) be the a'-score of the candidate x in the input stream of votes 
and s(x) be j times the a'-score of the candidate x in the sampled votes S. We know that there 
exists an £ = 0(pz log y) (and thus a corresponding p = f 2 (^)) which ensures that, for every 


candidate x e C, |s(rc) - s(x)| < a[e r n with probability at least 1 - | [15]. Let s(x ) be | times 


the frequency of the candidate x e C in the stream S. We now prove the following claim from 
which the result follows immediately. 


Claim 1. 


Pr[Vx e C, |s(cc) 


s(x)| < OL X e'n\ > 1 - - 


Proof. For every candidate xeC and every i e [£}, we define a random variable X,(x) to be 1 
if di = x and 0 otherwise. Then, s(x) = f Xfx). We have, E[s(x)] = s(x). Now using 
Chernoff bound from Theorem [H we have the following: 


in 

Pr[|s(x) - s(x)| > a^e'n] = Pr[|— ^ Xi(x ) - s(x)| > a^e'n] 


ie[t] 


= Pr[| Y, 




a. 


a^n 


r e 2 ainf, 
e 2 f 

< 2exp{- —} 

The fourth inequality follows from the fact that s(x) < o' n for every candidate x e C. Now we 
use the union bound to get the following. 


e 2 £ 5 

Pr[Vx e C, |s(x) - s(®)| < a^e'n] > 1 - Y 2exp{-} >1 — 

xeC 3 2 


The second inequality follows from an appropriate choice of £ = C)(fj log y ). □ 

We estimate the frequency of every candidate in S within an additive approximation ratio of 
o' zt and output the candidate w with maximum estimated frequency as the winner of the 
election. The candidate $w$ is an $\varepsilon$-winner (by Lemma 1) with probability at least $1 - \delta$
(by Claim 1). The space complexity of this algorithm follows from Theorem 2 (since
$\frac{1}{\alpha'_1} = \frac{\sum_{i \in [m]} \alpha_i}{\alpha_1} \le \frac{m \alpha_1}{\alpha_1} = m$) and Observation 1. □
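
The sampling idea in the proof above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the exact algorithm analysed here: the names `misra_gries`, `derived_stream` and `approx_scoring_winner` are hypothetical, `k` (one more than the number of Misra-Gries counters) and the sampling probability `p` are left to the caller rather than derived from $\varepsilon$ and $\delta$, and the score vector is normalised to sum to one so that it can be used as a probability distribution over positions.

```python
import random


def misra_gries(stream, k):
    """Misra-Gries summary with at most k - 1 counters.

    Each returned count underestimates the true frequency by at most
    (stream length) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def derived_stream(votes, alpha, p, rng):
    """Keep each vote with probability p, then emit one candidate per kept
    vote, chosen with probability proportional to the score of its position."""
    total = sum(alpha)
    weights = [a / total for a in alpha]  # normalised scores (assumption)
    for vote in votes:
        if rng.random() < p:
            yield rng.choices(vote, weights=weights, k=1)[0]


def approx_scoring_winner(votes, alpha, p, k, rng):
    """Return the candidate with the largest estimated frequency, or None."""
    summary = misra_gries(derived_stream(votes, alpha, p, rng), k)
    return max(summary, key=summary.get) if summary else None
```

With `p = 1.0` and the plurality score vector `[1, 0, ..., 0]` the derived stream is simply the stream of top-ranked candidates, which makes the sketch easy to check by hand.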

We present next the streaming algorithm for the approval voting rule. It is again obtained by 
running a frequency estimation algorithm on samples from a stream. 

Theorem 6. Assume that the number of votes is known to be within $[c_1 n, c_2 n]$ in advance, for some
constants $c_1$ and $c_2$. Then there is a one pass algorithm for the $(\varepsilon, \delta)$-Winner Determination
problem for the approval voting rule that uses $O\left(m\left(\log\log m + \log\frac{1}{\varepsilon} + \log\log\frac{1}{\delta}\right) + \log\log n\right)$ bits
of memory.
Proof. We sample $\ell$ many votes using the algorithm described in Observation 1 and the technique
described in the proof of Theorem 0 The total number of approvals in those sampled votes 
is at most ml and we estimate the number of approvals that every candidate receives within 
an additive approximation of y. The result now follows from the upper bound on l iH and 
Theorem 0 □ 


Now we move on to the maximin, Copeland, Bucklin, and plurality with run off voting rules. We
provide two algorithms for these voting rules, which trade off between the number of candidates
$m$ and the approximation factor $\varepsilon$. The algorithm in Theorem 7 below, which has better space
complexity when $\frac{1}{\varepsilon}$ is small compared to $m$, simply stores all the sampled votes.


Theorem 7. Assume that the number of votes is known to be within $[c_1 n, c_2 n]$ in advance,
for some constants $c_1$ and $c_2$. Then there is a one pass algorithm for the $(\varepsilon, \delta)$-Winner
Determination problem for the maximin, Bucklin, and plurality with run off voting rules


that use O 


O 


( m .o s yo g l + log log n) bi 
| — '' ’’ ? + log log nl bits of memory. 


bits of memory and for the Copeland voting rule that uses 


Proof. We sample $\ell$ many votes from the input stream of votes uniformly at random and simply
store all of them. Notice that we can store a vote using $O(m \log m)$ bits of space. The result now
follows from the upper bound on $\ell$ [15] and Observation 1. □
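
As a rough illustration of this "store the sample, then run the rule offline" approach, the sketch below keeps each vote with a fixed probability and evaluates the maximin rule on the stored sample. The function names and the parameter `p` are hypothetical, and the choice of `p` (which the analysis ties to $\varepsilon$, $\delta$ and $n$) is left to the caller.

```python
import random


def sample_votes(stream, p, rng):
    """Keep each vote independently with probability p and store it whole."""
    return [vote for vote in stream if rng.random() < p]


def maximin_winner(votes, candidates):
    """Maximin winner of the stored votes; each vote is a ranking (tuple)."""

    def beats(x, y):
        # Number of stored votes that rank x above y.
        return sum(1 for v in votes if v.index(x) < v.index(y))

    score = {
        x: min(beats(x, y) for y in candidates if y != x) for x in candidates
    }
    return max(candidates, key=lambda x: score[x])
```

For example, `maximin_winner(sample_votes(stream, p, random.Random(0)), candidates)` returns the maximin winner of the sample; the same stored sample could be fed to Bucklin or plurality with run off instead.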


Next we consider the case when $\frac{1}{\varepsilon}$ is large compared to $m$.

Theorem 8. Assume that the number of votes is known to be within $[c_1 n, c_2 n]$ in advance, for
some constants $c_1$ and $c_2$. Then there is a one pass algorithm for the $(\varepsilon, \delta)$-Winner Determination
problem for the maximin, Copeland, Bucklin, and plurality with runoff voting rules that uses
$O\left(m^2\left(\log\log m + \log\frac{1}{\varepsilon} + \log\log\frac{1}{\delta}\right) + \log\log n\right)$ bits of memory.


Proof. For each voting rule mentioned in the statement, we sample $\ell$ many votes $S = \{v_i : i \in [\ell]\}$
uniformly at random from the input stream of votes using the algorithm used in Observation 1
and the technique used in the proof of Theorem 3. From $S$, we generate another stream of
elements belonging to a different universe $U$ (which depends on the voting rule under consideration).
Finally, we calculate the frequencies of the elements of this generated stream, using Theorem 2, within
an additive approximation of y for maximin, Bucklin, and plurality with runoff voting rules 


and 


ei 


2 log m 


for the Copeland voting rule. The difference of approximation factor is due to [15], 


( log —— \ 

i s 1 for maximin, Bucklin, and plurality with run off voting rules and 


£ 


O ^ lo T, - j for the Copeland voting rule ifish . This bounds on £ prove the result once we 

describe S and U. Below, we describe the stream S and the universe U for individual voting 
rules. Let the vote $v_i$ be $c_1 \succ c_2 \succ \cdots \succ c_m$.


• maximin, Copeland: $U = C \times C$. From the vote $v_i$, we put $(c_j, c_k)$ in the generated stream for every $j < k$.

• Bucklin: $U = C \times [m]$. From the vote $v_i$, we put $(c_j, k)$ in the generated stream for every $j < k$.

• plurality with runoff: $U = C \times C$. From the vote $v_i$, we put $(c_j, c_k)$ in the generated stream for every $j < k$, and
also $(c_1, c_1)$. In the plurality with runoff voting rule, we need to estimate the plurality score of
every candidate, which we do by estimating the frequencies of the elements of the form $(x, x)$ in
the generated stream. We also need to estimate $D_S(x, y)$ for every pair of candidates $x, y \in C$, which we do by estimating
the frequencies of the elements of the form $(x, y)$. □
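
The derived streams listed above can be sketched as Python generators. This is a minimal illustration with hypothetical function names; `collections.Counter` stands in for the approximate frequency estimator, and for Bucklin the sketch assumes the index range $j \le k$, so that the frequency of $(x, k)$ counts the votes ranking $x$ within the top $k$ positions.

```python
from collections import Counter


def pair_stream(vote):
    """maximin / Copeland: emit (c_j, c_k) whenever c_j is ranked above c_k."""
    for j in range(len(vote)):
        for k in range(j + 1, len(vote)):
            yield (vote[j], vote[k])


def bucklin_stream(vote):
    """Bucklin: emit (c_j, k) for every position threshold k >= j (assumption),
    so that (x, k) is emitted exactly when x is within the top k positions."""
    for j, candidate in enumerate(vote, start=1):
        for k in range(j, len(vote) + 1):
            yield (candidate, k)


def runoff_stream(vote):
    """Plurality with runoff: the pairs above plus (c_1, c_1) for the top choice."""
    yield (vote[0], vote[0])
    yield from pair_stream(vote)


def exact_frequencies(votes, make_stream):
    """Exact stand-in for the frequency estimation step."""
    return Counter(element for vote in votes for element in make_stream(vote))
```

Replacing `exact_frequencies` with a small-space frequency estimator over the same generators gives the structure used in the proof; the universe has at most $m^2$ elements in each case.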
3.1 Unknown stream length


Now we consider the case when the number of voters is not known beforehand. The idea is to
use reservoir sampling ([35]) along with approximate counting ([18], [31]) to pick an element
from the stream almost uniformly at random. The following result shows that we can do so in a
space efficient manner.


Theorem 9. (Theorem 7 of [19]) Given an insertion only stream of length $n$ ($n$ is not known to
the algorithm beforehand) over a universe of size $m$, there is a randomized one pass algorithm that
outputs, with probability at least 1-5, the element at a random position X € [n] such that, for 
every i e [n], | Pr {X = i} - ^ | < £ using 0(log | + log 1 + log log n + log m ) bits of memory, for every 
e e (0,1] and 5 > 0. 


Recall that Theorem 2 only works for insertion only streams. However, as the stream progresses,
the element chosen by Theorem 9 changes; so, we cannot invoke Misra-Gries to do frequency
estimation on a set of samples given by Theorem 9. For streams with both insertions and
deletions, we have the following result, which is due to the count-min sketch [12].


Theorem 10. Given a turnstile stream of length $n$ over a universe of size $m$, there is a randomized
one pass algorithm to find the frequencies of the items in the stream within an additive approximation
of $\varepsilon n$ with probability at least $1 - \delta$ using $O\left(\frac{1}{\varepsilon}\log\frac{1}{\delta}(\log m + \log n)\right)$ bits of memory, for
every $\varepsilon > 0$ and $\delta > 0$.
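
For readers unfamiliar with the data structure behind Theorem 10, here is a minimal count-min sketch in Python. It is a textbook-style sketch under assumptions that are not taken from this paper: items are non-negative integers, the table has width $\lceil e/\varepsilon \rceil$ and depth $\lceil \ln(1/\delta) \rceil$, and the rows use hash functions of the form $((a x + b) \bmod P) \bmod \text{width}$ for a fixed prime $P$. The class and method names are hypothetical.

```python
import math
import random


class CountMinSketch:
    PRIME = (1 << 61) - 1  # a Mersenne prime larger than any item we hash

    def __init__(self, eps, delta, rng):
        self.width = math.ceil(math.e / eps)
        self.depth = max(1, math.ceil(math.log(1 / delta)))
        self.table = [[0] * self.width for _ in range(self.depth)]
        # One (a, b) pair per row defines that row's hash function.
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.depth)
        ]

    def _cell(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.PRIME) % self.width

    def update(self, item, count=1):
        """Apply an insertion (count > 0) or a deletion (count < 0)."""
        for row in range(self.depth):
            self.table[row][self._cell(row, item)] += count

    def estimate(self, item):
        """Never below the true frequency as long as all true frequencies
        are non-negative (the strict turnstile setting)."""
        return min(
            self.table[row][self._cell(row, item)] for row in range(self.depth)
        )
```

Because every cell only accumulates the counts of the items hashed to it, deletions are handled by the same `update` call with a negative `count`, which is what the insertion-only Misra-Gries summary cannot do.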


From Theorems 9 and 10 and from the proofs of Theorems 2, 4 to 6, and 8, we get the following.

Corollary 1. Assume that the number of votes $n$ is not known beforehand. Then there is a one
pass algorithm for the $(\varepsilon, \delta)$-Winner Determination problem for $k$-approval, $k$-veto, generalized
plurality, approval, maximin, Copeland, Bucklin, and plurality with run off voting rules that uses
logm log | times more space than the corresponding algorithms when n is known beforehand upto 
a constant factor. 


Proof. We use reservoir sampling with approximate counting from Theorem 9. The resulting
stream that we generate has both positive and negative updates (since in reservoir sampling,
we sometimes replace an item we previously sampled). Now we approximately estimate the
frequency of every item in the generated stream using Theorem 10. □
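
The way reservoir sampling turns an insertion-only stream into a stream of positive and negative updates can be sketched as follows. This is a simplified illustration: it uses the classical reservoir sampler with an exact position counter, whereas the result above relies on approximate counting to save space, and the function name is hypothetical.

```python
import random


def reservoir_updates(stream, size, rng):
    """Maintain a uniform sample of `size` items and yield (item, +1) when an
    item enters the sample and (item, -1) when it is evicted."""
    reservoir = []
    for position, item in enumerate(stream, start=1):
        if len(reservoir) < size:
            reservoir.append(item)
            yield (item, +1)
        else:
            # The new item replaces a uniformly chosen slot with
            # probability size / position.
            slot = rng.randrange(position)
            if slot < size:
                yield (reservoir[slot], -1)
                reservoir[slot] = item
                yield (item, +1)
```

Feeding the yielded pairs into a sketch that supports deletions keeps, at every point in time, frequency estimates for exactly the items currently in the sample.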


Again from Theorems 7 and 9, we get the following result, which provides a better space upper
bound than Corollary 1 when the number of candidates $m$ is large.


Corollary 2. Assume that the number of votes $n$ is not known beforehand. Then there is a one
pass algorithm for the $(\varepsilon, \delta)$-Winner Determination problem for the maximin, Bucklin, and


plurality with run off voting rules that use O j 
Copeland voting rule that uses O ( mlog ^ llQ gT 


m log 2 m log 4 


+ log log n I bits of memory and for the 


+ log log n ) bits of memory. 


3.2 Sliding Window Model 

Suppose we want to compute an $\varepsilon$-winner of the last $n$ many votes in an infinite stream of
votes for various voting rules. The following result shows that there is an algorithm, with the same
space complexity as in Theorem 9, to sample a vote from the last $n$ votes in a stream.
Theorem 11. fjk\l) Given an insertion only stream over a universe of size m, there is a randomized
one pass algorithm that outputs, with probability at least 1-5, the element at a random position 
X from last n positions such that, for eveiy i e [n], | Pr{X = i} - ^ using 0(log | + log 7 + 

log log n + log m) bits of memory, for every e e (0,1 ] and 6 > 0 . 

Theorem 11 immediately provides the same results as Corollaries 1 and 2, where $n$ is the window
size.

4 Lower Bounds 

In this section, we prove space complexity lower bounds for the $(\varepsilon, \delta)$-Winner Determination
problem for various voting rules. We reduce certain communication problems to the
$(\varepsilon, \delta)$-Winner Determination problem for proving space complexity lower bounds. Let us
first introduce those communication problems with the necessary results.

4.1 Communication Complexity 

Definition 4. (Augmented-Indexing$_{m,t}$)

Let $t$ and $m$ be positive integers. Alice is given a string $x = (x_1, \ldots, x_t) \in [m]^t$. Bob is given an
integer $i \in [t]$ and $(x_1, \ldots, x_{i-1})$. Bob has to output $x_i$.

The following communication complexity lower bound result is due to rilih by a simple exten¬ 
sion of the arguments of Bar-Yossef et al ifll. 

Lemma 2. 7^ -u,a 2 /(augmented-indexing,^) = D((l - 5)t\ogm) for any 6 < 1 - 
Also, we recall the multi-party version of the set-disjointness problem. 

Definition 5. (Disj$^{\text{promise}}_{m,t}$)

We have $t$ sets $X_1, \ldots, X_t$, each a subset of $[m]$. We have $t$ players and player $i$ is holding the set
$X_i$. We are also given the promise that either $X_i \cap X_j = \emptyset$ for every $i \ne j$, or there exists an element
$y \in [m]$ such that $y \in X_i$ for every $i \in [t]$ and $(X_i \setminus \{y\}) \cap (X_j \setminus \{y\}) = \emptyset$ for every $i \ne j$. The output
Disj$^{\text{promise}}_{m,t}(X_1, \ldots, X_t)$ is 1 if $X_i \cap X_j = \emptyset$ for every $i \ne j$ and 0 otherwise.

Lemma 3 (Proved in 0,[s].). IZ^~ way (D 1 s j/™" w - se ) = L2(y), for any 5 e [0, 1) and t. 

The following communication problem is very useful for us. 

Definition 6. (Max-Sum$_{m,t}$)

Alice is given a string $x = (x_1, x_2, \ldots, x_t) \in [m]^t$ of length $t$ over the universe $[m]$. Bob is given another
string $y = (y_1, y_2, \ldots, y_t) \in [m]^t$ of length $t$ over the same universe $[m]$. The strings $x$ and $y$ are such
that the index $i$ that maximizes $x_i + y_i$ is unique. Bob has to output the index $i \in [t]$ which satisfies
$x_i + y_i = \max_{j \in [t]} \{x_j + y_j\}$.

We establish the following one way communication complexity lower bound for the Max-Sum$_{m,t}$
problem by a reduction from the Augmented-Indexing$_{2, t \log m}$ problem.

Lemma 4. 1Z 1 5 ^“^(MAX-suMm^) = Q(t logm), for every 5 < j. 
Proof. We reduce the Augmented-Indexing$_{2, t \log m}$ problem to the Max-Sum$_{8m, t+1}$ problem, thereby
proving the result. Let the inputs to Alice and Bob in the AuGMENTED-iNDEXiNG 2 ) ji 0 gm in¬ 
stance be (ai,a 2 ,-"! a tiogm) e {0,l} tlogm and (ai,---,Oj_i) respectively. The idea is to con¬ 
struct a corresponding instance of the MAX-SUM 8m;i+1 problem that outputs t + 1 if and 
only if a,i = 0. We achieve this as follows. Alice starts execution of the MAX-suMgm^+i 
protocol using the vector x = (xi,x 2 ,---,xt+i) e [8m] t+1 which is defined as follows: the 
binary representation of Xj is ( 0 , 0 ,O(j_]^i O g m+ i,O(j_]^i O g m+ 2 )®(j-i)iogm+ 3 !”'’®jiogm) 0 ) 2 > for 
every j e [t], and xt+i is 0. Bob participates in the MAX-suMg mit+ i protocol with the 
vector y = (yi,y 2 ,---,yt+i) e [8m] t+1 which is defined as follows. Let us define A = 
flogml- We define yj = 0, for every j i {A,f + 1}. The binary representation of y\ is 
(1)0, o ( ^_i )logm+1 , a (A _i )logm+2 , Oj_t, 1,0,0, 0,0,1) 2 . Let us define an integer T whose bi¬ 

nary representation is (0,0, a (A _ 1} i og m + i, O(a-i) iogm+ 2 , ” a-i- 1 ,0,1,1, -, 1) 2 . We define y t+1 to be 
T + y\. First notice that the output of the MAX-suMg mit+ i instance is either A or t + 1, by the 
construction of y. Now observe that if aj = 1 then, x\ > T and thus the output of the Max- 
suMs, n ,t+i instance should be A. On the other hand, if a; = 0 then, x\ < T and thus the output 
of the MAX-suMgm^+i instance should be t + 1 . □ 


Finally, we also consider the Greater-than problem. 


Definition 7. (Greater-Than$_n$)

Alice is given an integer $x \in [n]$ and Bob is given an integer $y \in [n]$, $y \ne x$. Bob has to output 1 if
$x > y$ and 0 otherwise.


The following result is due to [29, 34]. We provide a simple proof of it that seems to be missing
in the literature.


Lemma 5. Tl] way ( Greater-than n ) = L2(log n), for every 5 <\. 


Proof. We reduce the Augmented-Indexing$_{2, \lceil \log n \rceil + 1}$ problem to the Greater-Than$_n$ problem,
thereby proving the result. Alice runs the Greater-Than$_n$ protocol with the input number whose
representation in binary is $a = (x_1 x_2 \cdots x_{\lceil \log n \rceil} 1)_2$. Bob participates in the Greater-Than$_n$
protocol with the input number whose representation in binary is
$b = (x_1 x_2 \cdots x_{i-1} 1 \underbrace{0 \cdots 0}_{(\lceil \log n \rceil - i + 1) \text{ 0's}})_2$.
Now $x_i = 1$ if and only if $a > b$. □
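
The reduction in this proof is easy to check mechanically. The sketch below builds Alice's and Bob's numbers from a bit string and verifies, for every string of a given length, that comparing them recovers the bit Bob is asked for. The function names are hypothetical and the string length is passed as a parameter rather than tied to $n$.

```python
from itertools import product


def alice_number(x):
    """a = (x_1 x_2 ... x_L 1) read as a binary number."""
    return int("".join(str(bit) for bit in x) + "1", 2)


def bob_number(prefix, length):
    """b = (x_1 ... x_{i-1} 1 0 ... 0) in binary, where i = len(prefix) + 1
    and the number of trailing zeros is length - i + 1."""
    i = len(prefix) + 1
    return int("".join(str(bit) for bit in prefix) + "1" + "0" * (length - i + 1), 2)


def recover_bit(x, i):
    """Bob's answer for x_i (1-indexed): 1 exactly when a > b."""
    a = alice_number(x)
    b = bob_number(x[: i - 1], len(x))
    return 1 if a > b else 0


def reduction_is_correct(length):
    """Exhaustively check the reduction for all bit strings of this length."""
    return all(
        recover_bit(x, i) == x[i - 1]
        for x in product((0, 1), repeat=length)
        for i in range(1, length + 1)
    )
```

The two numbers agree on the first $i - 1$ bits, so the comparison is decided at position $i$ when $x_i = 0$ and by Alice's trailing 1 when $x_i = 1$; in particular they are never equal.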


4.2 Reductions 

4.2.1 The cases $\varepsilon = 0$ and $\delta = 0$

We begin with the problem where we have to find the winner (i.e., a 0-winner) for a plurality
election. Notice that we can find the winner by exactly computing the plurality score of every
candidate. This requires $O(m \log n)$ bits of memory. We prove below that, when $n$ is much larger
than $m$, this space complexity is almost optimal even if we are allowed to use randomization,
by a reduction from the Max-Sum$_{n,m}$ problem. This strengthens a similar result proved in Karp
et al. [22] only for deterministic algorithms.

Theorem 12. Any one pass (0, <5)-Winner Determination algorithm for the plurality and gen¬ 
eralized plurality election must use f)(mlog(?r/m)) bits of memory, for any 5 e [0, |). 

A similar proof appears in [23], but theirs gives a weaker lower bound.
Proof. We prove the result for the $(0, \delta)$-Winner Determination problem for the plurality election.
This gives the result for the generalized plurality election since every plurality election is
also a generalized plurality election. Consider the Max-Sum$_{n,m}$ problem where Alice is given
a string $x = (x_1, \ldots, x_m) \in [n]^m$ and Bob is given another string $y = (y_1, \ldots, y_m) \in [n]^m$. The
candidate set of our election is $[m]$. The votes will be such that the only winner is
the candidate $i$ with $i \in \arg\max_{j \in [m]} \{x_j + y_j\}$. Moreover, the winner will be known to
Bob, thereby proving the result. Thus Bob can output the maximizing index $i$ correctly whenever our $(0, \delta)$-Winner
Determination algorithm outputs correctly. Alice generates $x_j$ many plurality votes for the
candidate $j$, for every $j \in [m]$. Alice now sends the memory content to Bob. Bob resumes the
run of the algorithm by generating $y_j$ many plurality votes for the candidate $j$, for every $j \in [m]$.
The plurality score of candidate $j$ is $(x_j + y_j)$ and thus the plurality winner will be a candidate
$i$ such that $i \in \arg\max_{j \in [m]} \{x_j + y_j\}$. Notice that the total number of votes is at most $2mn$. The
result now follows from Lemma 4. □
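
The protocol in this proof can be simulated directly. In the sketch below, an exact counting algorithm plays the role of the streaming algorithm: Alice feeds her votes, the state is handed over (this hand-over is the one-way message), and Bob continues with his votes. The class and function names are hypothetical, and the exact counter is only a stand-in for whatever $(0, \delta)$-Winner Determination algorithm is assumed to exist.

```python
class ExactPlurality:
    """Stand-in streaming algorithm: one exact counter per candidate."""

    def __init__(self):
        self.counts = {}

    def feed(self, candidate):
        self.counts[candidate] = self.counts.get(candidate, 0) + 1

    def winner(self):
        return max(self.counts, key=self.counts.get)


def votes_from_string(values):
    """Emit values[j-1] many plurality votes for candidate j (1-indexed)."""
    for j, copies in enumerate(values, start=1):
        for _ in range(copies):
            yield j


def solve_max_sum(x, y):
    """Return the index maximising x_j + y_j via a plurality election."""
    algorithm = ExactPlurality()
    for vote in votes_from_string(x):   # Alice's part of the stream
        algorithm.feed(vote)
    state = algorithm                   # memory content sent to Bob
    for vote in votes_from_string(y):   # Bob's part of the stream
        state.feed(vote)
    return state.winner()
```

The size of `state` is exactly what the lower bound constrains: any algorithm that lets Bob answer correctly must carry enough information across the hand-over to solve the communication problem.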

For the case when $m$ and $n$ are comparable, the following result is stronger. We prove this by
exhibiting a reduction from the promise set-disjointness problem of Definition 5.

Theorem 13. Any one pass $(0, \delta)$-Winner Determination algorithm for the plurality and generalized
plurality election must use $\Omega(\min\{m, n\})$ bits of memory, for any $\delta \in [0, 1)$.

Proof. Suppose we have a one pass $(0, \delta)$-Winner Determination algorithm for the plurality
election that uses $s$ bits of memory. We will demonstrate a one-way three party protocol to
compute the Disj$^{\text{promise}}_{m,3}$ function using $2s$ bits of communication, thus proving the result. We have
the candidate set $[m + 1]$. The protocol is as follows.

Player 1 starts running the one pass $(0, \delta)$-Winner Determination algorithm on the input
$X_1 \cup \{m + 1\}$. Once player 1 is done reading all its input, it sends its memory content to player
2. This needs at most $s$ bits of communication. Player 2 resumes the run of the algorithm with
input $X_2 \cup \{m + 1\}$ and sends its memory content to player 3. Again this needs at most $s$ bits
of communication. Player 3 resumes the run of the algorithm on input $X_3$ and outputs 1 if and
only if the winner is $m + 1$, and 0 otherwise. Notice that, if $X_i \cap X_j = \emptyset$ for every $i \ne j$, then
the only winner of the votes $(X_1, m + 1, X_2, m + 1, X_3)$ is the candidate $m + 1$ with a plurality
score of two. On the other hand, if there exists an element $y \in [m]$ such that $y \in X_i$ for every
$i \in [3]$ and $(X_i \setminus \{y\}) \cap (X_j \setminus \{y\}) = \emptyset$ for every $i \ne j$, then the only winner of the votes
$(X_1, m + 1, X_2, m + 1, X_3)$ is the candidate $y$ with a plurality score of three.

The number of candidates in the election above is $m + 1$ and the number of votes $n$ is $|X_1| +
|X_2| + |X_3| + 2(m + 1) = \Theta(m)$. This gives a space complexity lower bound of $\Omega(\min\{m, n\})$. □
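
The three-party protocol can be illustrated with a short simulation. The function name is hypothetical and an exact tally (`collections.Counter`) replaces the assumed streaming algorithm; the point is only to show how the extra candidate $m + 1$ separates the two promised cases.

```python
from collections import Counter


def disjointness_via_plurality(x1, x2, x3, m):
    """Return 1 if the sets are pairwise disjoint and 0 if they share a
    common element, assuming the promise of the problem holds."""
    dummy = m + 1
    # Votes in stream order: player 1, player 2, then player 3.
    stream = list(x1) + [dummy] + list(x2) + [dummy] + list(x3)
    tally = Counter(stream)
    winner = max(tally, key=tally.get)
    return 1 if winner == dummy else 0
```

In the disjoint case every element of $[m]$ receives at most one vote while the extra candidate receives two; in the intersecting case the common element receives three votes and wins.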

Theorems 12 and 13 give space complexity lower bounds for the case $\varepsilon = 0$. Next, we consider
the other extreme case: deterministically finding an $\varepsilon$-winner, corresponding to $\delta = 0$.

Theorem 14. Assume e < Then any one pass (e,0)-WiNNER Determination algorithm for 
the plurality election must use $\Omega(\log n)$ bits of memory, even if the number of voters is known up
to a factor of 2 and the number of candidates is only 2. The same applies for generalized plurality,
scoring rules, maximin, Copeland, Bucklin, and plurality with run off voting rules.

Proof. For the sake of contradiction, we assume that the number of possible memory contents
of the algorithm is $o(n)$, since otherwise the algorithm uses $\Omega(\log n)$ space and we have nothing
to prove. Our candidate set is $\{0, 1\}$. We will generate two vote streams, say $R_1$ and $R_2$, in such
a way that the final state of the algorithm is the same for both, while the $\varepsilon$-winner differs
between the two streams, thus providing the contradiction we are looking for.
Let $s_0$ be the starting state of the algorithm. Consider the stream of votes for 1 and let the
algorithm repeats its state for the first time after reading i many 1 votes. Let the state of the 
algorithm after reading i th 1 vote be same as the state the algorithm was after it read j th 1 
vote. Let us call p = i- j. Clearly p = o(n). Then there exist 61,62 = o(n) such that the state 
the algorithm will be after reading j- 6 \ many votes for 1 is same as the state it will be after 
reading + 62 many votes for 1. Let R\ be the stream of j - many votes for 1 followed 
by 77 many votes for 0. Let Il-> be the stream of ^ + 62 many votes for 1 followed by ^ many 
votes for 0. By construction the output of the algorithm is same for both the streams R\ and R 2 . 
However, candidate 1 is only e-winner in R\ and candidate 0 is only e-winner in R 2 . 

For elections with two candidates, scoring rules, maximin, Copeland, Bucklin, and plurality
with run off voting rules are the same as the plurality voting rule. □

4.2.2 Lower Bounds for Approximate and Randomized algorithms 

Now we move on and show space complexity lower bounds for the general $(\varepsilon, \delta)$-Winner Determination
problem for various voting rules. The observation below immediately follows from
the fact that the algorithm has to output a candidate as an $\varepsilon$-winner.

Observation 2. Every $(\varepsilon, \delta)$-Winner Determination algorithm, for all the voting rules considered
in this paper, needs $\Omega(\log m)$ bits of memory.

We show next a space complexity lower bound of $\Omega\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$ bits for the $(\varepsilon, \delta)$-Winner Determination
problem for various voting rules.

Theorem 15. Suppose the number of candidates mis at least 7 . Any one pass (e, <5)-Winner De¬ 
termination algorithm for approval, k-approval, for k = 0(m x ) for every A e [0,1), generalized 
plurality, Borda, maximin, Copeland, and plurality with run off elections must use D((l-5)i log -) 
bits of memory, even when the number of votes are exactly known beforehand, for every 1 - 5 > y. 

Proof. We will show that, when m > d, we need L2(^|log^) bits of memory for solv¬ 
ing the (^> 5)—Winner Determination problem, thereby proving the result. Consider the 
Augmented-indexinGi/^j i/^ problem where Alice is given a string x = ( xi,X 2 ,---,xy € 
[i Is/e] 1 ^ and Bob is given an integer i e [Vv^] and (xi,---,Xj-i). The candidate set of the 
election, that we generate, is [ fs/i] x [ f ,A]- The overview of the technique is as follows: Alice 
generates a stream of votes and runs the algorithm, then sends the memory content to Bob, and 
Bob resumes the run of the algorithm with another stream of votes (both the streams of votes 
depend on the voting rule under consideration) in such a way that the only ^-winner will be 
the candidate (xi,i). Thus Bob can output Xi correctly if and only if the (\/s/ 8 ,<f)-WiNNER De¬ 
termination algorithm outputs correctly. Now the result follows from Lemma |2j The elections 
for specific voting rules are as follows. Let n be the number of votes. 

• ^-approval for k = 0(m x ) for every A e [0,1), approval, and generalized plurality: It is 

enough to prove the result for the k- approval voting rule for k = 0(m x ) for every A e [0,1), 
since every fc-approval election is also an approval election. For k = 1, we get the result 
for the plurality voting rule and thus for the generalized plurality voting rule, since every 
plurality election is also a generalized plurality election. 

- Case 1: k < \/rnr. Alice generates a stream of | votes in such a way that the /.r-approval 
score of every candidate in {( Xj,j) : j e [Vv^]} is least [fc N /en/ 2 j and the fc-approval 
score of any other candidate is 0. Alice now sends the memory content of the algorithm 
to Bob. Bob resumes the run of the algorithm by generating another stream of n/2 votes 
in such a way that the ^-approval score of every candidate in {( j, i) : j e [ f X -\} is at least
[ky/en/ 2j and the A;-approval score of any other candidate is 0. The score of the candidate 
(xi,i) is at least [ky/en\ where as the score of every other candidate is at most \ky/en/2]. 
Hence the only v'e/s-winner is (x t . i). 


- Case 2: k > y/rn and k = 0(m A ) for any A e [0.5,1): Alice generates a stream of ^ votes 
in such a way that the /c-approval score of every candidate in {(xj,j) : j € [-!=]} is at least 

| and the fc-approval score of any other candidate is at most \{k - -^)n/ ^-^=(-d= - 1 ) j], 


which is at most | for sufficiently small constant £ (depending on A). Alice now 

sends the memory content of the algorithm to Bob. Bob resumes the run of the algorithm 
by generating another stream of votes in such a way that the ^-approval score of every 
candidate in {(j,i) ■ j e [-[=]} is at least % and the /c-approval score of any other candidate 


is Kfe~ 7g) ^ ( l_ 1) 


]. In this case also the only ^-winner is 


• Borda, Bucklin: Alice generates a stream of | votes where the candidates in {(xg,£),£ e 
[ Vn/e] } are uniformly distributed in top 1 /yE positions of the votes and the rest of the candi¬ 
dates are uniformly distributed in bottom i/e - 1 /y/e positions of the votes. Alice now sends 
the memory content to Bob and Bob resumes the run of the algorithm by generating another 
stream of n/2 votes where the candidates in {(£,i),£ e [Vv^]} are uniformly distributed in 
top t/v'e positions of the votes and the rest of the candidates are uniformly distributed in bot¬ 
tom t/e - t/^/e positions of the votes. The Borda score of the candidate (xi, i ) is ( 1 / e - 1/2 y/E)n 
whereas the Borda score of every other candidate is at most ( 1 / 2 e - l /4^)n. Hence, the only 
vA/«-winner for the Borda voting rule is (:c,;, i), since each vote change can reduce or increase 
the Borda score of any candidate by at most 1/e. 

The candidate (xi,i) is ranked within top 2 /3yfe positions in - n /‘3 many votes, whereas any 
other candidate is ranked within top 2 /3^e positions in at most «/:; many votes. Hence the 
only ^/s-winner for the Bucklin voting rule is (x*, i). 


• Any Condorcet consistent voting rule, Plurality with runoff: Let us define X = {(xg,£) ■ 

£ e [VVE]}, Y = [t/VE] x [i/yi] \ X. Suppose X and Y are arbitrary but fixed ordering of 
the candidates in X and Y respectively. For every £ e [7%/e], Alice generates y/en/ 4 votes 

of the form (xe,£) > X \ {(xe,£)} > Y and another y/en/ 4 votes of the form X \ (xe,£) > 
(xe,£) > Y , where X is the reverse order of X. Alice now sends the memory content to 
Bob. Let us define A = {(£,i) ■ £ e [ 1 /v / e]} and B = \}/y/e\ x [Vv^] x A. Suppose A and B are 
arbitrary but fixed ordering of A and B respectively. Bob resumes the run of the algorithm 

by generating another v / e ri / 4 votes of the form (£,i) > A \ (£,i) > B and another y/en /4 votes 
of the form A \ (£, i ) > (£, i) > B for every £ e [i/v^], where A is the reverse order of A . The 
candidate (xi, i) defeats every other candidate in pairwise election by a margin of at least j. 
Also the plurality score of the candidate (xi,i) is more than the plurality score of every other 
candidate by at least y/en. Hence the only v^/s-winner is □ 


We can prove a space lower bound of Q(me log 7 ) for one pass (£,<5)-Winner Determina¬ 
tion algorithms for Borda, Bucklin, Copeland, and maximin voting rules by reducing it from 
Augmented -iNDEXiNGi^ £ m in the proof of Theorem [L5l We summarize this observation below. 

Corollary 3. Suppose the number of candidates m is at least / Any one pass (£,<5)-Winner 
Determination algorithm for Borda, maximin, Copeland, and plurality with run off elections 
must use H(( 1 - <5)mlog^) bits of memory, even when the number of votes are exactly known 
beforehand, for every 1 - <5 > y. 
For the $k$-veto voting rule, we prove below, again by a reduction from Augmented-Indexing, a
slightly weaker space complexity lower bound compared to the bound of Theorem 15.

Theorem 16. Suppose the number of candidates m is at least f Any one pass (e,<5)-WiNNER 
Determination algorithm for the k-veto voting rule for k = 0(m x ), for eveiy A e [0,1), must use 
D(^-logi), for every constant p < 1 , bits of memory, even when the number of votes are exactly 
known beforehand, for every 1 - S > 

Proof We prove the result for (|,<5)-Winner Determination problem. Consider the 
Augmented-indexing_ i_i_ problem where the first player Alice is given a string 

> e! 1 £ 

while the second player Bob is given an integer i e [ 77 ] and Xj for every j < i. The candidate 
set of our election is [-^- 77 ] x [^]. The votes would be such that the only |-winner will be 
the candidate thereby proving the result. Thus Bob can output Xi correctly whenever 

our (e , (5)—Winner Determination algorithm outputs correctly. Alice generates a stream of ^ 
votes (assume n to be sufficiently large) in such a way that for every a, b e {(xj,j) : j e ^-} 
and x,y e [^pr] x [A.] s {( xj,j) : j e ^}, we have s(a) - s(x) > s(b ) - 1 < s(a) < s(b ) + 1 , 
and s(y ) - 1 < s(x) < s(y) + 1 , where s(-) is the number of vetoes that a candidate receives 
(which is always negative or zero). This is possible since k = 0(m x ) for A e [0,1). Alice now 
sends the memory content of the algorithm. Bob resumes the run of the algorithm by gen¬ 
erating another stream of ^ votes in such a way that for every a',b' e : z e 7 P 7 } and 

x',y' e [pM x [^j] \ {( z,i ) : 2 e ^4^}, we have s(a') - s(x') > ^, s(b') - 1 < s(a') < s(b') + 1 , 
and s(y') - 1 < s(x') < s(y') + 1. Now the score of (xi,i) is more than the score of every other 
candidate by at least Hence, the candidate (x*,i) is the unique —winner. □ 

For the ^-approval voting rule, we provide a stronger space complexity lower bound of 
0(7 log i), when the number of candidates m is at least 4j, by reducing from Augmented- 

INDEXING 1 k. 

£ ’ £ 

Theorem 17. Assume that the number of candidates m is at least 77 . Then any one pass 
Winner Determination algorithm for the k-approval voting rule must use H(| log ^) bits of 
memory. 

Proof We prove the result for (|,<5)-Winner Determination problem. Consider the 

k 

Augmented-indexing 1 k problem where Alice is given {x\,---,xk) € [ 7 ]^ and Bob is given 
(x\, •••, 1 ). We will create a /.’-approval election in such a way that the |-winner will reveal x, 

to Bob. The candidate set of our election is [^] x [ 7 ]. For every j <e [k], Alice generates ^ many 
votes approving candidates in {(x kU _ 1)+1 ,k(j - 1 ) + 1), (x fc(j _i )+2 , k(j - 1 ) + 2 ),■■■, (x kj ,kj)}. 
Alice now sends the memory content to Bob. Let X - {(j,i) : j e [4]}- If k < ^ then, Bob 
generates ^ votes in such a way that every candidate in X gets at least many approvals and 
the candidates in [ 7 ] x [ 7 ] \ X does not get any approval from the votes that Bob generates. 
Now, the ^-approval score of the candidate ( Xi,i ) is at least (k + l)4y, whereas every other 
candidate gets at most many approvals. Hence, (xj,i) is the unique |-winner. If k > - 
then, Bob generates ^ votes in such a way that every candidate in X gets 7 many approvals 
and every candidate in [i] x [ 7 ] \ X gets at most (k - fc/e a -i/e -I s - 2 man y approvals from 
the votes that Bob generates. Here again the A'-approval score of the candidate (x l . i) is at least 
(1 + e)4, where as the A;-approval score of every other candidate is at most Hence, (xi,i) is 
the unique |-winner. □ 
For the generalized plurality voting rule, we provide a f2(-^=log m) space complexity lower

bound, again by reducing from Augmented- iNDEXiNG m j_. This bound is better than the lower 

_ ’ v? 

bound of Theorem[l5lwhen m is exponentially larger compared to 7. 

Theorem 18. Suppose the number of candidates m is at least 4=. Any one pass (e,J)-WiNNER 
Determination algorithm for the generalized plurality rule mustuse fi(-^=log m) bits of memory, 
for every 1 - <5 > ^. 

Proof We prove the result for (|,<5 )-Winner Determination problem. Consider the 

1 

Augmented- iNDEXiNG m j_ problem where Alice is given a string x = e \m\'A and 

’ \A s/r 

Bob is given an integer i e [-^= ] and (x±, ■ ■ ■, Xi -±). The candidate set of our election is [m] x [-1= ]. 

The votes would be such that the only |-winner will be the candidate (xi,i), thereby proving 
the result. Thus Bob can output rr* correctly whenever our (|, 5 )-Winner Determination al¬ 
gorithm outputs correctly Alice generates (-^= - j)en many approvals for candidate (xj.j), for 

every j < 4=. Alice now sends the memory content of the algorithm. Bob resumes the run of 
the algorithm by generating (4= - j)en many approvals for candidate (xj.j), for every j < i. 
Notice that, the only |-winner is the candidate (xi,i). Now the space complexity lower bound 
follows from Lemma El □ 

The space complexity lower bound in Theorem [15] for the plurality voting rule matches with 
the upper bound of Theorem [3] when \ < m < For the case when m < 7, we now show 

a matching space complexity lower bound for the plurality voting rule. We prove this result by
exhibiting a reduction from the Max-Sum$_{\frac{1}{\varepsilon}, m}$ problem.

Theorem 19. Assume that the number of candidates $m$ is at most $\frac{1}{\varepsilon}$. Then any one pass
$(\varepsilon, \delta)$-Winner Determination algorithm for the plurality, generalized plurality, approval, $k$-approval
for $k = O(m^{\lambda})$, for any $\lambda \in [0, 1)$, maximin, Copeland, Bucklin, and plurality with run off voting rules
must use $\Omega\left(m \log\frac{1}{\varepsilon}\right)$ bits of memory.

Proof. First, let us prove the result for the plurality voting rule. Suppose we have a one pass 
$(\varepsilon, \delta)$-Winner Determination algorithm for the plurality election which uses $s(n, \varepsilon)$ bits of 
memory. Consider the communication problem Max-sum$_{1/\varepsilon, m}$. Let the inputs to Alice and Bob 
in the Max-sum$_{1/\varepsilon, m}$ instance be $x = (x_1, x_2, \ldots, x_m) \in [1/\varepsilon]^m$ and 
$y = (y_1, y_2, \ldots, y_m) \in [1/\varepsilon]^m$ respectively. The candidate set of the election is $[m]$. Alice generates 
$x_i$ many plurality votes for the candidate $i$, for every $i \in [m]$. Alice now sends the memory content 
of the algorithm to Bob. Bob resumes the run of the algorithm by generating $y_i$ many plurality votes 
for the candidate $i$, for every $i \in [m]$. Suppose $i = \arg\max_{j \in [m]} \{x_j + y_j\}$ (recall from Definition 6 
that there exists a unique element $i$ that maximizes $x_i + y_i$) and let $\ell \in [m]$ be any candidate 
other than $i$. Then we have the following: 

$$\frac{(x_i + y_i) - (x_\ell + y_\ell)}{\sum_{j \in [m]} (x_j + y_j)} \;\ge\; \frac{\varepsilon}{2m} \;\ge\; \frac{\varepsilon^2}{2}$$

The first inequality follows from the fact that $(x_i + y_i) - (x_\ell + y_\ell) \ge 1$ and 
$\sum_{j \in [m]} (x_j + y_j) \le 2m/\varepsilon$, since every $x_j$ and $y_j$ is at most $1/\varepsilon$. 
The second inequality follows from the assumption that $m \le 1/\varepsilon$. Hence, whenever the 
$(c\varepsilon^2, \delta)$-Winner Determination algorithm outputs a $c\varepsilon^2$-winner, for a suitable constant $c > 0$, 
Bob also outputs correctly in the Max-sum$_{1/\varepsilon, m}$ problem instance. 
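
The margin bound used in this step can be verified numerically on a small instance. The sketch below is illustrative only: the inputs `x` and `y`, the value `inv_eps` standing for $1/\varepsilon$, and the helper names are assumptions chosen for the example, and an exact tally stands in for the streaming algorithm.

```python
from fractions import Fraction


def plurality_tally(x, y):
    """Final plurality scores: candidate j receives x[j] + y[j] votes."""
    return [a + b for a, b in zip(x, y)]


def normalized_margin(x, y):
    """Gap between the top two scores as a fraction of all votes cast."""
    tally = plurality_tally(x, y)
    top, runner_up = sorted(tally, reverse=True)[:2]
    return Fraction(top - runner_up, sum(tally))


inv_eps = 8                  # hypothetical value of 1 / epsilon
eps = Fraction(1, inv_eps)
x = [3, 8, 2, 5]             # Alice's input, entries in {1, ..., 1/epsilon}
y = [4, 7, 1, 8]             # Bob's input, entries in {1, ..., 1/epsilon}
m = len(x)                   # m = 4, which satisfies m <= 1 / epsilon

margin = normalized_margin(x, y)
# The chain of inequalities from the proof, checked with exact arithmetic.
assert margin >= eps / (2 * m) >= eps ** 2 / 2
```

Exact rational arithmetic is used so that the comparison is not affected by floating point rounding.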
For the other voting rules, the idea is the same as above: we will generate votes in such a way 
that the candidate $i$ with $i = \arg\max_{j \in [m]} \{x_j + y_j\}$ wins by a margin of at least one. 
Below, we only specify the votes to be generated for other voting rules. 

• Generalized plurality, approval: Follows immediately from the fact that every plurality 
election is a valid generalized plurality and approval election too. 

• $k$-approval for $k = O(m^{\lambda})$, for any $\lambda \in [0, 1)$: Alice (respectively Bob) generates $x_i$ 
(respectively $y_i$) many votes such that candidate $i$ gets $x_i$ (respectively $y_i$) many approvals and the 
remaining $(k - 1)x_i$ (respectively $(k - 1)y_i$) approvals are equally distributed among the other $m - 1$ candidates. 

• Borda, maximin, Copeland, Bucklin, plurality with run off: Alice (respectively Bob) 
generates $x_i$ (respectively $y_i$) many votes of the form $i \succ C_{-i}$, and another $x_i$ (respectively $y_i$) 
many votes of the form $i \succ \overleftarrow{C_{-i}}$, where $C_{-i}$ is an arbitrary but fixed order of the candidates in 
$C \setminus \{i\}$ and $\overleftarrow{C_{-i}}$ is the reverse order of $C_{-i}$. □ 
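
The effect of pairing each ranking with its reverse can be seen on a toy instance. The sketch below is a hedged illustration: it takes the combined counts `s[i]`, standing for $x_i + y_i$, as given, fixes the order of the remaining candidates to be increasing index (an arbitrary choice made for the example), and computes Borda scores and pairwise margins exactly. The helper names are hypothetical.

```python
def build_votes(s):
    """For each candidate i, emit s[i] rankings with i on top followed by a
    fixed order of the others, and s[i] rankings with i on top followed by
    the reverse of that order. Rankings are tuples, best candidate first."""
    m = len(s)
    votes = []
    for i in range(m):
        rest = [c for c in range(m) if c != i]
        forward = (i, *rest)
        backward = (i, *reversed(rest))
        votes += [forward] * s[i] + [backward] * s[i]
    return votes


def borda_scores(votes, m):
    """A candidate in position p (0-based) of a ranking earns m - 1 - p points."""
    scores = [0] * m
    for ranking in votes:
        for position, candidate in enumerate(ranking):
            scores[candidate] += m - 1 - position
    return scores


def pairwise_margin(votes, a, b):
    """Number of votes ranking a above b minus the number ranking b above a."""
    return sum(1 if v.index(a) < v.index(b) else -1 for v in votes)


s = [7, 15, 3, 13]           # hypothetical values of x_i + y_i
votes = build_votes(s)
print(borda_scores(votes, len(s)))
print(pairwise_margin(votes, 1, 3))
```

In this example the forward and reversed rankings headed by a third candidate cancel in every pairwise comparison, so the pairwise margin between two candidates depends only on the difference of their counts, and the candidate with the largest count beats every other candidate head to head.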

Now we show space complexity lower bounds that depend on the number of votes n. The result 
below is obtained by reducing from the Greater-than$_n$ problem. The lower bound is tight in 
the number of votes n. 

Theorem 20. Any one pass $(\varepsilon, \delta)$-Winner Determination algorithm for the plurality voting 
rule must use $\Omega(\log \log n)$ memory bits, even if the number of candidates is only 2, for every sufficiently small constant $\delta$. 
The same applies for generalized plurality, scoring rules, maximin, Copeland, Bucklin, and plurality 
with run off voting rules. 

Proof. Suppose we have a one pass $(\varepsilon, \delta)$-Winner Determination algorithm for the plurality 
election which uses $s(n)$ bits of space. Using this algorithm, we will show a communication 
protocol for the Greater-than$_n$ problem whose communication complexity is $s(2^n)$, thereby proving 
the statement. The candidate set is $\{0, 1\}$. Alice generates a stream of $2^x$ many plurality votes 
for the candidate 1. Alice now sends the memory content of the algorithm. Bob resumes the 
run of the algorithm by generating a stream of $2^y$ many plurality votes for the candidate 0. If 
$x > y$ then the candidate 1 is the only $\varepsilon$-winner, whereas if $x < y$ then the candidate 0 is the only 
$\varepsilon$-winner. 

For elections with two candidates, generalized plurality, scoring rules, maximin, Copeland, 
Bucklin, and plurality with run off voting rules are the same as the plurality voting rule. □ 
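
The two-candidate encoding in this proof is easy to simulate. The sketch below is a toy illustration under stated assumptions: an exact tally replaces the streaming algorithm, the inputs satisfy $x \ne y$, and the function names are hypothetical. It also reports the winner's share of the votes, which is at least two thirds because the two counts are distinct powers of two.

```python
from fractions import Fraction


def greater_than_via_plurality(x, y):
    """Decide whether x > y from the plurality winner of a two-candidate
    election with 2**x votes for candidate 1 and 2**y votes for candidate 0.
    Assumes x != y."""
    votes_for_1 = 2 ** x     # Alice's part of the stream
    votes_for_0 = 2 ** y     # Bob's part of the stream
    return votes_for_1 > votes_for_0


def winner_share(x, y):
    """Fraction of all votes received by the plurality winner."""
    big, small = 2 ** max(x, y), 2 ** min(x, y)
    return Fraction(big, big + small)


if __name__ == "__main__":
    for x, y in [(3, 5), (6, 2), (0, 1)]:
        print(x, y, greater_than_via_plurality(x, y), winner_share(x, y))
```

Because the winner always holds at least two thirds of the votes in this encoding, the loser is not an approximate winner for any sufficiently small $\varepsilon$.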


5 Conclusions and Future Work 

In this work, we studied the space complexity for determining approximate winners in the 
setting where votes are inserted continually into a data stream. We showed that allowing 
randomization and approximation indeed allows for much more space-efficient algorithms. Moreover, 
our bounds are tight in certain parameter ranges. 

The most immediate open question is to close the gaps between the upper and lower bounds. 
In particular, even for plurality, the dependence on $m$ and $\varepsilon$ is not tight when $m$ is large. Also, 
for the other voting rules, are there more sophisticated algorithms which improve our upper 
bounds? In a different vein, it may be interesting to implement these streaming algorithms for 
use in practice (say, for participatory democracy experiments or for online social networks) and 
investigate how they perform. Finally, instead of having the algorithm be passive, could we 
improve performance by having the algorithm actively query the voters as they appear in the 
stream? 